Search | BVS Regional Portal

1.

Getting the bugs out of AI: Advancing ecological research on arthropods through computer vision.

Schneider, Stefan; Taylor, Graham W; Kremer, Stefan C; Fryxell, John M.

Ecol Lett ; 26(7): 1247-1258, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37216316

ABSTRACT

Deep learning for computer vision has shown promising results in the field of entomology; however, untapped potential remains. Deep learning performance is enabled primarily by large quantities of annotated data which, outside of rare circumstances, are limited in ecological studies. Currently, to utilize deep learning systems, ecologists undertake extensive data collection efforts or limit their problem to niche tasks. These solutions do not scale to region-agnostic models. However, there are solutions that employ data augmentation, simulators, generative models, and self-supervised learning that can supplement limited labelled data. Here, we highlight the success of deep learning for computer vision within entomology, discuss data collection efforts, provide methodologies for optimizing learning from limited annotations, and conclude with practical guidelines for how to achieve a foundation model for entomology capable of accessible automated ecological monitoring on a global scale.

Subjects

Arthropods, Animals, Computers

2.

Decision Tree Ensembles Utilizing Multivariate Splits Are Effective at Investigating Beta Diversity in Medically Relevant 16S Amplicon Sequencing Data.

Rudar, Josip; Golding, G Brian; Kremer, Stefan C; Hajibabaei, Mehrdad.

Microbiol Spectr ; : e0206522, 2023 Mar 06.

Article in English | MEDLINE | ID: mdl-36877086

ABSTRACT

Developing an understanding of how microbial communities vary across conditions is an important analytical step. We used 16S rRNA data isolated from human stool samples to investigate whether learned dissimilarities, such as those produced using unsupervised decision tree ensembles, can be used to improve the analysis of the composition of bacterial communities in patients suffering from Crohn's disease and adenomas/colorectal cancers. We also introduce a workflow capable of learning dissimilarities, projecting them into a lower dimensional space, and identifying features that impact the location of samples in the projections. For example, when used with the centered log ratio transformation, our new workflow (TreeOrdination) could identify differences in the microbial communities of Crohn's disease patients and healthy controls. Further investigation of our models elucidated the global impact amplicon sequence variants (ASVs) had on the locations of samples in the projected space and how each ASV impacted individual samples in this space. Furthermore, this approach can be used to integrate patient data easily into the model and results in models that generalize well to unseen data. Models employing multivariate splits can improve the analysis of complex high-throughput sequencing data sets because they are better able to learn about the underlying structure of the data set. IMPORTANCE There is an ever-increasing level of interest in accurately modeling and understanding the roles that commensal organisms play in human health and disease. We show that learned representations can be used to create informative ordinations. We also demonstrate that the application of modern model introspection algorithms can be used to investigate and quantify the impacts of taxa in these ordinations, and that the taxa identified by these approaches have been associated with immune-mediated inflammatory diseases and colorectal cancer.
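The centered log ratio transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the TreeOrdination implementation; the function name `clr`, the sample counts, and the pseudocount used to handle zero counts are all assumptions made for illustration.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's feature counts.

    Each value becomes log(x_i) minus the mean of the logs, so the
    transformed values sum to zero. The pseudocount is an illustrative
    way to avoid log(0); it is not taken from the paper.
    """
    shifted = [c + pseudocount for c in counts]
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical ASV counts for a single stool sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

Because the transform subtracts the mean log value, each sample is expressed relative to its own geometric mean, which is why it is commonly applied to compositional sequencing counts before ordination.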

3.

Long-term TE persistence even without beneficial insertion.

Kremer, Stefan C; Linquist, Stefan; Saylor, Brent; Elliott, Tyler A; Gregory, T Ryan; Cottenie, Karl.

BMC Genomics ; 22(1): 260, 2021 Apr 12.

Article in English | MEDLINE | ID: mdl-33845764

ABSTRACT

This correspondence responds to the critique by Butler et al. (BMC Genomics 22:241, 2021) of our recent paper on transposable element (TE) persistence. We address the three main objections raised by Butler et al. After running a series of additional simulations that were inspired by the authors' criticisms, we are able to present a more nuanced understanding of the conditions that generate long-term persistence.

Subjects

DNA Transposable Elements, DNA Transposable Elements/genetics

4.

Transposable element persistence via potential genome-level ecosystem engineering.

Kremer, Stefan C; Linquist, Stefan; Saylor, Brent; Elliott, Tyler A; Gregory, T Ryan; Cottenie, Karl.

BMC Genomics ; 21(1): 367, 2020 May 19.

Article in English | MEDLINE | ID: mdl-32429843

ABSTRACT

BACKGROUND: The nuclear genomes of eukaryotes vary enormously in size, with much of this variability attributable to differential accumulation of transposable elements (TEs). To date, the precise evolutionary and ecological conditions influencing TE accumulation remain poorly understood. Most previous attempts to identify these conditions have focused on evolutionary processes occurring at the host organism level, whereas we explore a TE ecology explanation. RESULTS: As an alternative (or additional) hypothesis, we propose that ecological mechanisms occurring within the host cell may contribute to patterns of TE accumulation. To test this idea, we conducted a series of experiments using a simulated asexual TE/host system. Each experiment tracked the accumulation rate for a given type of TE within a particular host genome. TEs in this system had a net deleterious effect on host fitness, which did not change over the course of experiments. As one might expect, in the majority of experiments TEs were either purged from the genome or drove the host population to extinction. However, in an intriguing handful of cases, TEs co-existed with hosts and accumulated to very large numbers. This tended to occur when TEs achieved a stable density relative to non-TE sequences in the genome (as opposed to reaching any particular absolute number). In our model, the only way to maintain a stable density was for TEs to generate new, inactive copies at a rate that balanced with the production of active (replicating) copies. CONCLUSIONS: From a TE ecology perspective, we suggest this could be interpreted as a case of ecosystem engineering within the genome, where TEs persist by creating their own "habitat".

Subjects

DNA Transposable Elements/physiology, Ecosystem, Genome, Genetic Models, Biological Coevolution, DNA Transposable Elements/genetics, Eukaryota/genetics, Molecular Evolution, Genetic Fitness, Genomic Instability

5.

Three critical factors affecting automated image species recognition performance for camera traps.

Schneider, Stefan; Greenberg, Saul; Taylor, Graham W; Kremer, Stefan C.

Ecol Evol ; 10(7): 3503-3517, 2020 Apr.

Article in English | MEDLINE | ID: mdl-32274005

ABSTRACT

Ecological camera traps are increasingly used by wildlife biologists to unobtrusively monitor an ecosystem's animal population. However, manual inspection of the images produced is expensive, laborious, and time-consuming. The success of deep learning systems using camera trap images has been previously explored in preliminary stages. These studies, however, are lacking in their practicality. They are primarily focused on extremely large datasets, often millions of images, and there is little to no focus on performance when tasked with species identification in new locations not seen during training. Our goal was to test the capabilities of deep learning systems trained on camera trap images using modestly sized training data, compare performance when considering unseen background locations, and quantify the gradient of lower bound performance to provide a guideline of data requirements in correspondence to performance expectations. We use a dataset provided by Parks Canada containing 47,279 images collected from 36 unique geographic locations across multiple environments. Images represent 55 animal species and human activity with high class imbalance. We trained, tested, and compared the capabilities of six deep learning computer vision networks using transfer learning and image augmentation: DenseNet201, Inception-ResNet-V3, InceptionV3, NASNetMobile, MobileNetV2, and Xception. We compare overall performance on "trained" locations where DenseNet201 performed best with 95.6% top-1 accuracy, showing promise for deep learning methods for smaller scale research efforts. Using trained locations, classifications with <500 images had low and highly variable recall of 0.750 ± 0.329, while classifications with over 1,000 images had a high and stable recall of 0.971 ± 0.0137. Models tasked with classifying species from untrained locations were less accurate, with DenseNet201 performing best with 68.7% top-1 accuracy. Finally, we provide an open repository where ecologists can insert their image data to train and test custom species detection models for their desired ecological domain.
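The two metrics quoted in the abstract, top-1 accuracy and per-class recall, can be computed as in the following minimal sketch. The labels and predictions are invented, the function names are hypothetical, and reading the reported "±" as a sample standard deviation across classes is an assumption rather than something the abstract states.

```python
from collections import Counter
from statistics import mean, stdev


def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / true instances of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def top1_accuracy(y_true, y_pred):
    """Fraction of images whose single predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented example labels; not data from the Parks Canada dataset.
y_true = ["elk", "elk", "elk", "elk", "bear", "bear", "wolf", "wolf"]
y_pred = ["elk", "elk", "elk", "bear", "bear", "elk", "wolf", "wolf"]

recalls = per_class_recall(y_true, y_pred)
accuracy = top1_accuracy(y_true, y_pred)
recall_values = list(recalls.values())
print(recalls, accuracy, mean(recall_values), stdev(recall_values))
```

Reporting recall per class alongside overall accuracy matters for imbalanced data, because a rare species can be missed entirely while top-1 accuracy stays high.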

6.

Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.

Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C.

IEEE/ACM Trans Comput Biol Bioinform ; 15(1): 191-204, 2018.

Article in English | MEDLINE | ID: mdl-28092571

ABSTRACT

This study presents a machine learning method that increases the number of identified bases in Sanger Sequencing. The system post-processes a KB basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence, and a human finished sequence. Corrections are added to the dataset when an alignment determines the additional read and human agree on the identity of the N-label. KB must also rate the replacement with quality value of in the additional read. Corrections are only available during system training. Developing the system, nearly 850,000 N-labels are obtained from Barcode of Life Datasystems, the premier database of genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of percent. Our system only applies corrections when it estimates a low rate of error. Tested on this data, our automation selects and recovers: 79 percent of N-labels from COI (animal barcode); 80 percent from matK and rbcL (plant barcodes); and 58 percent from non-protein-coding sequences (across eukaryotes).

Subjects

Taxonomic DNA Barcoding/methods, Genomics/methods, Machine Learning, Animals, Humans, Computer Neural Networks

7.

Yes! There are resilient generalizations (or "laws") in ecology.

Linquist, Stefan; Gregory, T Ryan; Elliott, Tyler A; Saylor, Brent; Kremer, Stefan C; Cottenie, Karl.

Q Rev Biol ; 91(2): 119-31, 2016 Jun.

Article in English | MEDLINE | ID: mdl-27405221

ABSTRACT

It is often argued that ecological communities admit of no useful generalizations or "laws" because these systems are especially prone to contingent historical events. Detractors respond that this argument assumes an overly stringent definition of laws of nature. Under a more relaxed conception, it is argued that ecological laws emerge at the level of communities and elsewhere. A brief review of this debate reveals an issue with deep philosophical roots that is unlikely to be resolved by a better understanding of generalizations in ecology. We therefore propose a strategy for transforming the conceptual question about the nature of ecological laws into a set of empirically tractable hypotheses about the relative resilience of ecological generalizations across three dimensions: taxonomy, habitat type, and scale. These hypotheses are tested using a survey of 240 meta-analyses in ecology. Our central finding is that generalizations in community ecology are just as prevalent and as resilient as those in population or ecosystem ecology. These findings should help to establish community ecology as a generality-seeking science as opposed to a science of case studies. They also support the capacity for ecologists, working at any of the three levels, to inform matters of public policy.

Subjects

Ecology, Philosophy, Animals, Ecological and Environmental Phenomena, Humans, Biological Models

8.

Prediction of Protein Coding Regions Using a Wide-Range Wavelet Window Method.

Marhon, Sajid A; Kremer, Stefan C.

IEEE/ACM Trans Comput Biol Bioinform ; 13(4): 742-53, 2016.

Article in English | MEDLINE | ID: mdl-26415183

ABSTRACT

Prediction of protein coding regions is an important topic in the field of genomic sequence analysis. Several spectrum-based techniques for the prediction of protein coding regions have been proposed. However, the outstanding issue in most of the proposed techniques is that these techniques depend on an experimentally-selected, predefined value of the window length. In this paper, we propose a new Wide-Range Wavelet Window (WRWW) method for the prediction of protein coding regions. The analysis of the proposed wavelet window shows that its frequency response can adapt its width to accommodate the change in the window length so that it can allow or prevent frequencies other than the basic frequency in the analysis of DNA sequences. This feature makes the proposed window capable of analyzing DNA sequences with a wide range of the window lengths without degradation in the performance. The experimental analysis of applying the WRWW method and other spectrum-based methods to five benchmark datasets has shown that the proposed method outperforms other methods along a wide range of the window lengths. In addition, the experimental analysis has shown that the proposed method is dominant in the prediction of both short and long exons.

Subjects

Genomics/methods, Open Reading Frames/genetics, DNA Sequence Analysis/methods, Statistical Models, Proteins/genetics, Wavelet Analysis

9.

Applying ecological models to communities of genetic elements: the case of neutral theory.

Linquist, Stefan; Cottenie, Karl; Elliott, Tyler A; Saylor, Brent; Kremer, Stefan C; Gregory, T Ryan.

Mol Ecol ; 24(13): 3232-42, 2015 Jul.

Article in English | MEDLINE | ID: mdl-25919906

ABSTRACT

A promising recent development in molecular biology involves viewing the genome as a mini-ecosystem, where genetic elements are compared to organisms and the surrounding cellular and genomic structures are regarded as the local environment. Here, we critically evaluate the prospects of ecological neutral theory (ENT), a popular model in ecology, as it applies at the genomic level. This assessment requires an overview of the controversy surrounding neutral models in community ecology. In particular, we discuss the limitations of using ENT both as an explanation of community dynamics and as a null hypothesis. We then analyse a case study in which ENT has been applied to genomic data. Our central finding is that genetic elements do not conform to the requirements of ENT once its assumptions and limitations are made explicit. We further compare this genome-level application of ENT to two other, more familiar approaches in genomics that rely on neutral mechanisms: Kimura's molecular neutral theory and Lynch's mutational-hazard model. Interestingly, this comparison reveals that there are two distinct concepts of neutrality associated with these models, which we dub 'fitness neutrality' and 'competitive neutrality'. This distinction helps to clarify the various roles for neutral models in genomics, for example in explaining the evolution of genome size.

Subjects

Biodiversity, Ecology/methods, Genomics/methods, Biological Models, Mutation

10.

A novel application of ecological analyses to assess transposable element distributions in the genome of the domestic cow, Bos taurus.

Saylor, Brent; Elliott, Tyler A; Linquist, Stefan; Kremer, Stefan C; Gregory, T Ryan; Cottenie, Karl.

Genome ; 56(9): 521-33, 2013 Sep.

Article in English | MEDLINE | ID: mdl-24168673

ABSTRACT

Transposable elements (TEs) are among the most abundant components of many eukaryotic genomes. Efforts to explain TE abundance, as well as TE diversity among genomes, have led some researchers to draw an analogy between genomic and ecological processes. Adopting this perspective, we conducted an analysis of the cow (Bos taurus) genome using techniques developed by community ecologists to determine whether environmental factors influence community composition. Specifically, each chromosome within the Bos taurus genome was treated as a "linear transect", and a multivariate redundancy analysis (RDA) was used to identify large-scale spatial patterns in TE communities associated with 10 TE families. The position of each TE community on the chromosome accounted for ~50% of the variation along the chromosome "transect". Multivariate analysis further revealed an effect of gene density on TE communities that is influenced by several other factors in the (genomic) environment, including chromosome length and TE density. The results of this analysis demonstrate that ecological methods can be applied successfully to help answer genomic questions.

Subjects

Cattle/genetics, DNA Transposable Elements, Genome, Animals, Mammalian Chromosomes/genetics, Ecosystem, Multivariate Analysis, Population Dynamics, Spatial Analysis

11.

Distinguishing ecological from evolutionary approaches to transposable elements.

Linquist, Stefan; Saylor, Brent; Cottenie, Karl; Elliott, Tyler A; Kremer, Stefan C; Gregory, T Ryan.

Biol Rev Camb Philos Soc ; 88(3): 573-84, 2013 Aug.

Article in English | MEDLINE | ID: mdl-23347261

ABSTRACT

Considerable variation exists not only in the kinds of transposable elements (TEs) occurring within the genomes of different species, but also in their abundance and distribution. Noting a similarity to the assortment of organisms among ecosystems, some researchers have called for an ecological approach to the study of transposon dynamics. However, there are several ways to adopt such an approach, and it is sometimes unclear what an ecological perspective will add to the existing co-evolutionary framework for explaining transposon-host interactions. This review aims to clarify the conceptual foundations of transposon ecology in order to evaluate its explanatory prospects. We begin by identifying three unanswered questions regarding the abundance and distribution of TEs that potentially call for an ecological explanation. We then offer an operational distinction between evolutionary and ecological approaches to these questions. By determining the amount of variance in transposon abundance and distribution that is explained by ecological and evolutionary factors, respectively, it is possible empirically to assess the prospects for each of these explanatory frameworks. To illustrate how this methodology applies to a concrete example, we analyzed whole-genome data for one set of distantly related mammals and another more closely related group of arthropods. Our expectation was that ecological factors are most informative for explaining differences among individual TE lineages, rather than TE families, and for explaining their distribution among closely related as opposed to distantly related host genomes. We found that, in these data sets, ecological factors do in fact explain most of the variation in TE abundance and distribution among TE lineages across less distantly related host organisms. Evolutionary factors were not significant at these levels. However, the explanatory roles of evolution and ecology become inverted at the level of TE families or among more distantly related genomes. Not only does this example demonstrate the utility of our distinction between ecological and evolutionary perspectives, it further suggests an appropriate explanatory domain for the burgeoning discipline of transposon ecology. The fact that ecological processes appear to be impacting TE lineages over relatively short time scales further raises the possibility that transposons might serve as useful model systems for testing more general hypotheses in ecology.

Subjects

Physiological Adaptation/genetics, Biological Evolution, DNA Transposable Elements/physiology, Ecosystem, Gene Expression Regulation/physiology, Animals

12.

Gene prediction based on DNA spectral analysis: a literature review.

Marhon, Sajid A; Kremer, Stefan C.

J Comput Biol ; 18(4): 639-76, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21381961

ABSTRACT

The identification of regions of DNA sequences that code for proteins is one of the most fundamental applications in bioinformatics. These protein-coding regions are in contrast to other DNA regions that encode functional RNA molecules, provide structural stability of chromosomes, serve as genetic raw materials, represent molecular fossils, or have no known purpose (sometimes called "junk DNA"). A number of approaches have been suggested for differentiating between the protein-coding and non-protein-coding regions of DNA. A selection of these approaches is based on digital signal processing (DSP) techniques. These DSP techniques rely on the phenomenon that protein-coding regions have a prominent power spectrum peak at frequency f = 1/3 arising from the length of codons (three nucleic acids). This article partitions the identification of protein-coding regions into four discrete steps. Based on this partitioning, DSP techniques can be easily described and compared based on their unique implementations of the processing steps. We compare the approaches, and discuss strengths and weaknesses of each in the context of different applications. Our work provides an accessible introduction and comparative review of DSP methods for the identification of protein-coding regions. Additionally, by breaking down the approaches into four steps, we suggest new combinations that may be worthy of future study.
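The period-3 peak described in the abstract can be sketched as follows. This is a minimal illustration and not any specific method from the review: it assumes the DNA is mapped to four binary indicator sequences (one common numerical mapping), and the function names, window length, and step size are hypothetical choices.

```python
import cmath

BASES = "ACGT"


def period3_power(window):
    """Spectral power at frequency 1/3, summed over the four bases.

    For each base, a binary indicator sequence (1 where the base occurs,
    0 elsewhere) is projected onto exp(-2*pi*i*n/3) and the squared
    magnitude is accumulated. This equals DFT bin N/3 when the window
    length N is a multiple of 3.
    """
    total = 0.0
    for base in BASES:
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(window)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


def sliding_period3(sequence, window_len=351, step=3):
    """Return (start, power) pairs for windows slid along the sequence.

    window_len and step are illustrative defaults, not recommended values.
    """
    return [
        (start, period3_power(sequence[start:start + window_len]))
        for start in range(0, len(sequence) - window_len + 1, step)
    ]


# Toy sequence: a codon-periodic stretch flanked by non-periodic runs.
toy = "A" * 30 + "ATG" * 20 + "A" * 30
windows = sliding_period3(toy, window_len=30, step=30)
for start, power in windows:
    print(start, round(power, 3))
```

In the toy sequence the windows covering the repeated codon show high power while the flanking homopolymer windows show essentially none, which is the contrast that spectrum-based gene finders threshold on.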

Subjects

DNA/genetics, Genomics/methods, Open Reading Frames, DNA Sequence Analysis/methods, Animals, Humans, Genetic Models

13.

Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization.

Ma, Eddie Y T; Cameron, Christopher J F; Kremer, Stefan C.

BMC Bioinformatics ; 11 Suppl 8: S4, 2010 Oct 26.

Article in English | MEDLINE | ID: mdl-21034429

ABSTRACT

UNLABELLED: This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis on datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new datasets as well as previously studied datasets. This methodology is rigorous and statistically grounded, and ultimately culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work. BACKGROUND: Our work can be viewed as an alternative to existing methods to solve the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches both from a methodological and also a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology. RESULTS: Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems. CONCLUSIONS: We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains.

Subjects

Artificial Intelligence, Computational Biology/methods, Factual Databases, Automated Pattern Recognition/methods, Algorithms, Alkaloids, Animals, Mice, Proteins/classification, Quantitative Structure-Activity Relationship, Rats, Regression Analysis, Reproducibility of Results, Software

14.

Theoretical justification of computing the 3-base periodicity using nucleotide distribution variance.

Marhon, Sajid; Kremer, Stefan C.

Biosystems ; 101(3): 185-6, 2010 Sep.

Article in English | MEDLINE | ID: mdl-20633601

ABSTRACT

In a previous paper (Yin and Yau, 2005), a novel method was proposed to measure the power spectrum of a DNA sequence at frequency N/3 in order to distinguish protein-coding and non-coding regions in DNA sequences. This was accomplished by computing the distribution of the four nucleotides in the three reading frames (codon positions) and identifying variance as an indicator of 3-base periodicity. That work included an empirical justification for the claim that there exists a linear, 3:2 correlation between the variance and the power spectrum. In this note, we provide a theoretical justification for that observation in the form of a mathematical proof of this correlation. This work thus provides a more rigorous justification for the use of the variance instead of the more computationally expensive power spectrum, allowing users of this technique to apply it with absolute confidence that no compromise in accuracy is incurred.
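The relationship described in the abstract can be checked numerically with a short sketch. It assumes binary indicator sequences for the four nucleotides and takes the "variance" to be the sum of squared deviations of each nucleotide's three codon-position counts from their mean; the normalization used in the papers may differ, and the function names and test sequence are invented. Under this definition, the power at frequency N/3 equals exactly 3/2 times that sum.

```python
import cmath

BASES = "ACGT"


def position_counts(seq):
    """counts[base][j] = occurrences of base at codon position j (0, 1, 2)."""
    counts = {base: [0, 0, 0] for base in BASES}
    for n, symbol in enumerate(seq):
        if symbol in counts:
            counts[symbol][n % 3] += 1
    return counts


def squared_deviation_sum(seq):
    """Sum over bases of squared deviations of position counts from their mean."""
    total = 0.0
    for freqs in position_counts(seq).values():
        m = sum(freqs) / 3
        total += sum((f - m) ** 2 for f in freqs)
    return total


def power_at_one_third(seq):
    """Power spectrum at frequency N/3, computed directly from the DFT sum."""
    total = 0.0
    for base in BASES:
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


# Invented 39-nucleotide sequence used only to compare the two quantities.
seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
spectrum = power_at_one_third(seq)
deviation = squared_deviation_sum(seq)
print(spectrum, 1.5 * deviation)
```

The count-based quantity needs only one pass over the sequence and twelve counters, which is the computational saving the note refers to.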

Subjects

Base Composition/genetics, Codon/genetics, Computational Biology/methods, Genetic Models, Nucleotides/genetics, Open Reading Frames/genetics, Analysis of Variance, Fourier Analysis

15.

A taxonomy for spatiotemporal connectionist networks revisited: the unsupervised case.

Barreto, Guilherme de A; Araújo, Aluizio F R; Kremer, Stefan C.

Neural Comput ; 15(6): 1255-320, 2003 Jun.

Article in English | MEDLINE | ID: mdl-12816574

ABSTRACT

Spatiotemporal connectionist networks (STCNs) comprise an important class of neural models that can deal with patterns distributed in both time and space. In this article, we widen the application domain of the taxonomy for supervised STCNs recently proposed by Kremer (2001) to the unsupervised case. This is possible through a reinterpretation of the state vector as a vector of latent (hidden) variables, as proposed by Meinicke (2000). The goal of this generalized taxonomy is then to provide a nonlinear generative framework for describing unsupervised spatiotemporal networks, making it easier to compare and contrast their representational and operational characteristics. Computational properties, representational issues, and learning are also discussed, and a number of references to the relevant source publications are provided. It is argued that the proposed approach is simple and more powerful than the previous attempts from a descriptive and predictive viewpoint. We also discuss the relation of this taxonomy with automata theory and state-space modeling and suggest directions for further work.

Subjects

Neurological Models, Neural Pathways, Classification, Neurons
